Small Cell Lung Cancer

El Mehdi Baknine, s194533
Jakob Frostholm Højgaard, s194527
Jonathan Dragestad Møller, s184243
Mikkel Niklas Rasmussen, s193518
Thomas Malthe Mølgaard Tams, s204540

2023-11-28

Introduction

Paper and data source

Title: Comprehensive genomic profiles of small cell lung cancer, George J. et al. (2015)

Loading:
- Dimensions: (81, 31669)
- 30 metadata columns
- 31639 gene expression columns

Data clean:
- Check duplicate IDs
- Clean weird variables
- Check NAs

Data augment:
- 33 metadata columns
- 400 transcripts

Purpose: Identify different small cell lung cancer profiles

Methods

Load, clean and augment

  • Load data from two sheets of an Excel file and combine them into a single file.
  • Clean the data by creating usable column names and checking for NAs.
  • Augment with 3 new variables:
    - Survival status
    - Dead/alive
    - Treatment type
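The load and clean steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the helper names `clean_name`, `clean_rows` and `combine_sheets`, the column headers, and all values are hypothetical, and the two sheets are represented as lists of row dictionaries instead of being read from the Excel file.

```python
import re


def clean_name(name: str) -> str:
    """Turn an arbitrary column header into a lowercase snake_case name."""
    name = name.strip().lower()
    # Replace every run of characters other than a-z and 0-9 with "_".
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")


def clean_rows(rows):
    """Apply clean_name to every column header in a list of row dicts."""
    return [{clean_name(key): value for key, value in row.items()} for row in rows]


def combine_sheets(metadata_rows, expression_rows, key="sample_id"):
    """Join two lists of row dicts on a shared sample identifier."""
    expression_by_id = {row[key]: row for row in expression_rows}
    combined = []
    for row in metadata_rows:
        match = expression_by_id.get(row[key])
        if match is not None:  # keep only samples present in both sheets
            combined.append({**row, **match})
    return combined


# Toy stand-ins for the two sheets (hypothetical headers and values).
metadata = [
    {"Sample-ID": "S1", "Overall survival (months)": 12},
    {"Sample-ID": "S2", "Overall survival (months)": None},
]
expression = [
    {"Sample-ID": "S1", "TP53": 3.2},
    {"Sample-ID": "S2", "TP53": 1.1},
]

combined = combine_sheets(clean_rows(metadata), clean_rows(expression))
print(combined[0])
```

Cleaning the headers before joining means both sheets share the same key name, so the join does not depend on how each sheet happened to spell the sample identifier.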

Analysis specific methods

  • Select transcripts of interest via k-means clustering
  • Heatmap of expression values - Data exploration
  • Hierarchical clustering of samples - Two groups
  • PCA - Check metadata and identify possible transcripts
  • Logistic regression - Statistically identify transcripts of interest in each group
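As an illustration of the PCA step listed above, the following sketch computes principal components with NumPy's singular value decomposition on simulated data. It assumes a samples x transcripts matrix; the function name `pca`, the simulated groups, and the shift applied to one feature are all made up for the example and do not reflect the study's data or results.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples x features matrix via SVD of the column-centered data."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                     # feature weights per component
    explained = (S ** 2) / np.sum(S ** 2)            # fraction of variance per component
    return scores, loadings, explained[:n_components]


# Simulated data: two groups of 10 samples and 5 features,
# differing only in the first feature (an assumption for illustration).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(10, 5))
group_b = rng.normal(0.0, 1.0, size=(10, 5))
group_b[:, 0] += 8.0
X = np.vstack([group_a, group_b])

scores, loadings, explained = pca(X)
print("explained variance ratio:", explained)
print("feature with largest PC1 loading:", int(np.argmax(np.abs(loadings[0]))))
```

Plotting the first two columns of `scores` colored by a metadata variable is one way to check whether that variable separates the samples, and the rows of `loadings` indicate which features drive each component.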

Overview of metadata

Results

PCA

PCA2

Conclusion

Loading

  • Dimensions:

Cleaning

  • New dimensions: 81, 31669

  • Check for duplicates in SampleIDs

  • Clean weird variables

  • Check NAs
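The duplicate and NA checks above could look roughly like the sketch below. It is a plain-Python illustration under assumptions: the helper names `duplicated_ids` and `na_counts`, the column names, and the example rows are hypothetical, and missing values are taken to be either `None` or a floating-point NaN.

```python
from collections import Counter


def duplicated_ids(rows, key="sample_id"):
    """Return the sorted sample IDs that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(sample_id for sample_id, n in counts.items() if n > 1)


def na_counts(rows):
    """Count missing values (None or NaN) per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            # NaN is the only value that is not equal to itself.
            if value is None or value != value:
                counts[column] += 1
    return dict(counts)


# Hypothetical rows with one duplicated ID and two missing values.
rows = [
    {"sample_id": "S1", "age": 61, "stage": "III"},
    {"sample_id": "S2", "age": None, "stage": "IV"},
    {"sample_id": "S2", "age": 70, "stage": float("nan")},
]

print("duplicated IDs:", duplicated_ids(rows))
print("missing values per column:", na_counts(rows))
```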

Augmenting

  • New dimensions: 81, 31772